A study of nginx max_fails and proxy_next_upstream

Testing how nginx chooses a backend server based on max_fails and the related proxy_next_upstream settings.

I. Test environment
nginx configuration file:

user  worker;
worker_processes  1;
pid  logs/nginx.pid;

events {
    worker_connections  1024;
}

http {
    include       mime.types;
    default_type  application/octet-stream;

    log_format  test  '"$remote_addr|$upstream_addr" '
                      '[$time_local] '
                      '"$request" $status $body_bytes_sent '
                      '"$cookie_jsessionid" "$http_referer" "$http_user_agent"';
    # "main" is not defined in this file; use the predefined "combined" format
    access_log  logs/access.log  combined;

    sendfile  on;
    #tcp_nopush  on;

    #keepalive_timeout  0;
    keepalive_timeout  65;

    #gzip  on;

    upstream backend {
        server 10.103.16.30:23456 weight=1 max_fails=3 fail_timeout=30;
        server 10.103.16.28:23456 weight=1 max_fails=3 fail_timeout=30;
    }

    server {
        listen 22222;
        access_log logs/test.log test;
        error_log  logs/test.err;

        location / {
            proxy_set_header Host $host;
            proxy_set_header X-Forwarded-For $remote_addr;
            proxy_pass http://backend;
            # parameters are space-separated; "http_500,http_404" is not a valid value
            proxy_next_upstream error timeout invalid_header http_500 http_404;
        }
    }
}

Host running nginx: 10.103.16.30
Backend servers:
10.103.16.28
10.103.16.30
Both backends have the same weight, so under normal conditions nginx alternates between them,
as shown in the figure below:
[Figure: image not available]
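The alternation with equal weights follows from how nginx's default round-robin balancer picks peers. As we understand it, nginx uses a "smooth" weighted round-robin: on every pick each peer's current weight grows by its weight, the peer with the largest current weight wins, and the winner's current weight is then reduced by the total weight. The Python sketch below models only that selection step and assumes every peer is healthy; it ignores max_fails, backup servers and retries, and the `Peer`/`pick` names are illustrative, not nginx's own.

```python
from dataclasses import dataclass


@dataclass
class Peer:
    name: str
    weight: int
    current_weight: int = 0


def pick(peers):
    """Smooth weighted round-robin: return the peer chosen for one request."""
    total = 0
    best = None
    for peer in peers:
        # every peer gains its weight on each pick
        peer.current_weight += peer.weight
        total += peer.weight
        # strict '>' means the first peer wins a tie
        if best is None or peer.current_weight > best.current_weight:
            best = peer
    # the winner pays back the total, so the other peers catch up later
    best.current_weight -= total
    return best


if __name__ == "__main__":
    backend = [Peer("10.103.16.30:23456", 1), Peer("10.103.16.28:23456", 1)]
    print([pick(backend).name for _ in range(4)])
```

With two peers of weight 1 the picks simply alternate, which is the behavior described above. With weights 5, 1 and 1 the first seven picks come out as a, a, b, a, c, a, a: the heavy peer is spread out instead of being chosen five times in a row, which is why the scheme is called smooth.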
In this configuration, proxy_next_upstream includes http_500 and http_404.
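The original does not say which HTTP server listened on port 23456 on the backends. To reproduce the tests, any server that always answers with a chosen status code is enough; the sketch below is a hypothetical stand-in built only on Python's standard library (`http.server`). The file name `backend.py`, its command-line argument and the response body are assumptions.

```python
import sys
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


def make_handler(status):
    """Build a request handler class that answers every GET with `status`."""

    class FixedStatusHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            body = f"{status} from {self.server.server_address[1]}\n".encode()
            self.send_response(status)
            self.send_header("Content-Type", "text/plain")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

        def log_message(self, fmt, *args):
            # one timestamped line per request on stderr, enough to see
            # when nginx stops and resumes sending requests here
            sys.stderr.write("%s %s\n" % (self.log_date_time_string(), fmt % args))

    return FixedStatusHandler


def serve(status, port=23456):
    server = ThreadingHTTPServer(("0.0.0.0", port), make_handler(status))
    server.serve_forever()


if __name__ == "__main__":
    # e.g. `python backend.py 500` on one machine, `python backend.py 200` on the other
    serve(int(sys.argv[1]) if len(sys.argv) > 1 else 200)
```

Started as `python backend.py 500` (or `404`) on 10.103.16.30 and `python backend.py 200` on 10.103.16.28, the stderr lines serve as the backend-side log used in the tests below.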

II. Test procedure
1. With proxy_next_upstream configured to include http_500 and http_404
a. 5xx error test
10.103.16.30 returns 500; 10.103.16.28 returns 200.
The nginx log is as follows:
[Figure: nginx log, image not available]
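What the results show:
Client requests sent to 16.30 fail with a 500 error. After 3 failures, nginx stops forwarding requests to 16.30. Once fail_timeout (30s) has passed, nginx tries 16.30 again, gets another 500, and again stops forwarding requests to it. The cycle then repeats.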
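The cycle above (three failures, a pause of fail_timeout, a single probe, another pause) can be reproduced with a small simulation. The sketch below is a simplified model, not nginx's code: each peer keeps a failure counter `fails`, the time of its last failure (`accessed`) and the start of its current check window (`checked`), and it is skipped while `fails >= max_fails` and no more than `fail_timeout` seconds have passed since `checked`. It assumes one request per second, plain alternation among available peers, and no retry on the next server; `failure_codes` is a hypothetical parameter standing for the status codes that count as failures under the current proxy_next_upstream setting.

```python
from dataclasses import dataclass


@dataclass
class Peer:
    name: str
    status: int            # status code this backend always returns
    max_fails: int = 3
    fail_timeout: int = 30
    fails: int = 0
    accessed: int = 0      # time of the last failure
    checked: int = 0       # start of the current check window

    def available(self, now):
        # skipped while it has failed max_fails times and fail_timeout
        # has not yet passed since the window started (max_fails=0 disables this)
        return not (self.max_fails and self.fails >= self.max_fails
                    and now - self.checked <= self.fail_timeout)

    def record(self, now, failed):
        if now - self.checked > self.fail_timeout:
            self.checked = now   # more than fail_timeout since last check: new window
        if failed:
            self.fails += 1
            self.accessed = self.checked = now
        elif self.accessed < self.checked:
            self.fails = 0       # success in a new window: peer counts as healthy


def simulate(peers, failure_codes, seconds):
    """Send one request per second; return the times each peer was chosen."""
    hits = {p.name: [] for p in peers}
    nxt = 0
    for now in range(seconds):
        for i in range(len(peers)):
            peer = peers[(nxt + i) % len(peers)]
            if peer.available(now):
                nxt = (nxt + i + 1) % len(peers)
                hits[peer.name].append(now)
                peer.record(now, failed=peer.status in failure_codes)
                break
    return hits


if __name__ == "__main__":
    backend = [Peer("10.103.16.30", 500), Peer("10.103.16.28", 200)]
    print(simulate(backend, failure_codes={500}, seconds=70)["10.103.16.30"])
```

With `failure_codes={500}` (case 1a) this prints `[0, 2, 4, 35, 66]`: three failures, then one probe each time more than 30 s have passed since the last failure. With `failure_codes=set()`, which models the default `proxy_next_upstream error timeout` of case 2a, 10.103.16.30 keeps being chosen every other second.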

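The log of the HTTP server on backend 10.103.16.30 likewise shows that after 3 failures 16.30 receives no more requests, and that 30s later it is accessed exactly once.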

[Figure: log of the 10.103.16.30 backend, image not available]
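Because the `test` log_format writes `"$remote_addr|$upstream_addr"` as its first field, per-backend request counts can be read straight from logs/test.log. The sketch below assumes exactly that format. When nginx passes a request on to another server, $upstream_addr lists every server contacted, separated by ", " (groups from an internal redirect are separated by " : "), so each address is counted separately. The function name `upstream_hits` is hypothetical.

```python
import re
from collections import Counter

# first field of log_format "test": "$remote_addr|$upstream_addr"
FIRST_FIELD = re.compile(r'^"([^|"]*)\|([^"]*)"')


def upstream_hits(lines):
    """Count how many times each upstream address appears in test.log lines."""
    hits = Counter()
    for line in lines:
        m = FIRST_FIELD.match(line)
        if not m:
            continue  # not a line in the "test" format
        for group in m.group(2).split(" : "):
            for addr in group.split(", "):
                if addr and addr != "-":  # "-" means no upstream was contacted
                    hits[addr] += 1
    return hits


if __name__ == "__main__":
    with open("logs/test.log") as f:
        for addr, n in upstream_hits(f).most_common():
            print(addr, n)
```

With http_500 in proxy_next_upstream, a request that fails on 16.30 and is then served by 16.28 shows both addresses, so the count for 16.30 is the number of times nginx actually contacted it.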
b. 4xx error test
16.30 returns 404; 16.28 returns 200.
[Figure: image not available]
What the results show:
nginx keeps alternating between 16.28 and 16.30, even though 16.30 returns 404.

2. Without proxy_next_upstream configured
a. 5xx error test
16.30 returns 500; 16.28 returns 200.
[Figure: image not available]
What the results show:
nginx keeps alternating between 16.28 and 16.30, even though 16.30 returns 500.

b. 4xx error test
16.30 returns 404; 16.28 returns 200.
[Figure: image not available]
What the results show:
nginx keeps alternating between 16.28 and 16.30, even though 16.30 returns 404.

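Conclusions
max_fails:
When the number of "failed attempts" against a backend server within the fail_timeout period reaches max_fails, nginx stops forwarding requests to that server for a period of time (also set by fail_timeout, 10s by default).
A "failed attempt" covers the following cases:
1. The three conditions error, timeout and invalid_header always count as failed attempts:

  • error: an error occurred while establishing a connection with the backend server, passing the request to it, or reading the response header
  • timeout: a timeout occurred while establishing a connection with the backend server, passing the request to it, or reading the response header
  • invalid_header: the backend server returned an empty response or an invalid response header
2. Errors listed in the proxy_next_upstream directive: http_500 | http_502 | http_503 | http_504 | http_403 | http_404.
   Below, http_5xx stands for the parameters matching HTTP 5xx status codes, and http_4xx for those matching 4xx status codes.
   1) If proxy_next_upstream includes the parameter for a 5xx code (for example http_500), backend responses with that code count as failed attempts. Once max_fails is reached, nginx stops forwarding requests to the server for the fail_timeout period; after that it tries the server once more, and if that attempt also fails, the cycle repeats and nginx again stops forwarding requests to it.
   2) If proxy_next_upstream does not include that parameter, such 5xx responses are ignored and do not count as failed attempts.
   3) Whether or not proxy_next_upstream includes http_403 or http_404, 4xx responses never count as failed attempts.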
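The rules above can be condensed into a small predicate. This is a hypothetical helper that only restates the conclusions and the documentation quoted below (it is not an nginx API): error, timeout and invalid_header always count; http_500/502/503/504 count only when listed in proxy_next_upstream; 403 and 404 never count. When the directive is absent, nginx's default list `error timeout` is assumed.

```python
ALWAYS_UNSUCCESSFUL = {"error", "timeout", "invalid_header"}
COUNTABLE_STATUSES = {500, 502, 503, 504}  # count only when listed in the directive
DEFAULT_NEXT_UPSTREAM = "error timeout"    # default value of proxy_next_upstream


def is_unsuccessful(outcome, next_upstream=DEFAULT_NEXT_UPSTREAM):
    """Return True if `outcome` counts toward max_fails.

    outcome: "error", "timeout", "invalid_header", or an HTTP status code (int).
    next_upstream: the proxy_next_upstream parameters, space-separated.
    """
    if outcome in ALWAYS_UNSUCCESSFUL:
        return True
    if isinstance(outcome, int) and outcome in COUNTABLE_STATUSES:
        return f"http_{outcome}" in next_upstream.split()
    # 403, 404 and any other status code never count
    return False
```

For the configuration in this article (`error timeout invalid_header http_500 http_404`), a 500 counts and a 404 does not; with no proxy_next_upstream at all, neither does, which matches tests 1 and 2.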

Official documentation:

max_fails=number
sets the number of unsuccessful attempts to communicate with the server that should happen in the duration set by the fail_timeout parameter to consider the server unavailable for a duration also set by the fail_timeout parameter. By default, the number of unsuccessful attempts is set to 1. The zero value disables the accounting of attempts. What is considered an unsuccessful attempt is defined by the proxy_next_upstream, fastcgi_next_upstream, uwsgi_next_upstream, scgi_next_upstream, and memcached_next_upstream directives.

fail_timeout=time
sets the time during which the specified number of unsuccessful attempts to communicate with the server should happen to consider the server unavailable; and the period of time the server will be considered unavailable.
By default, the parameter is set to 10 seconds.

proxy_next_upstream
The directive also defines what is considered an unsuccessful attempt of communication with a server. The cases of error, timeout and invalid_header are always considered unsuccessful attempts, even if they are not specified in the directive. The cases of http_500, http_502, http_503 and http_504 are considered unsuccessful attempts only if they are specified in the directive. The cases of http_403 and http_404 are never considered unsuccessful attempts.